Method and apparatus for extracting journey of life attributes of a user from user interactions

ABSTRACT

Embodiments of the invention relate to managing user interactions and, more particularly, to performing analysis on data generated by user interactions. Embodiments of the invention use text mining to extract personal information of users from user interactions automatically. A topic model is used to reduce the number of dimensions required to represent the text, yet all the information of interest is highly pronounced. This enables a lower dimensional representation of the data leading to significantly faster computations.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. provisional patent application Ser. No. 61/814,011, filed Apr. 19, 2013, and is a continuation-in-part of U.S. patent application Ser. No. 14/161,071, filed Jan. 22, 2014, which application claims priority to U.S. provisional patent application Ser. No. 61/755,868, filed Jan. 23, 2013, and to U.S. provisional patent application Ser. No. 61/769,067, filed Feb. 25, 2013, each of which application is incorporated herein in its entirety by this reference thereto.

FIELD

The invention relates to managing user interactions. More particularly, the invention relates to performing analysis on data generated by user interactions.

BACKGROUND

People often require assistance when performing certain tasks. They may seek assistance from agents, where they interact with the agents using a medium such as textual chats; voice chats, e.g. over a telephone network, a cellular network, a Voice over Internet Protocol (IP) (VoIP) network, etc.; an online forum; a social network; and so on. Such assistance may be requested in connection with purchasing specific items, inquiring about items, troubleshooting issues they face, and so on.

In these interactions, individuals might share information related to their personal life with the agents. It would be advantageous if such information could be used to understand the persona of the individual better and build a profile of them, for example to tailor the interactions and/or services and products which may be offered to the individual.

SUMMARY

Embodiments of the invention relate to managing user interactions and, more particularly, to performing analysis on data generated by user interactions. Embodiments of the invention use text mining to extract personal information of users from user interactions automatically. A topic model is used to reduce the number of dimensions required to represent the text, yet all the information of interest is highly pronounced. This enables a lower dimensional representation of the data leading to significantly faster computations.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block schematic diagram showing an architecture used to extract personal information about a user from user interactions according to the invention;

FIG. 2 is a block schematic diagram showing an analyzer configured for automatically extracting personal information of users from user interactions using a text-mining method according to the invention;

FIG. 3 is a flow diagram showing a process for topic modeling according to the invention;

FIG. 4 is a flow diagram showing a process for automatically extracting personal information of users from user interactions using a text-mining method according to the invention;

FIG. 5 is an example of identification of user's specific personal topics based on chat data according to the invention;

FIG. 6 is a block schematic diagram showing the identification of a user's personal information topic based on text data according to the invention;

FIG. 7 is a block schematic diagram showing user identification according to the invention; and

FIG. 8 is a block schematic diagram showing a machine in the example form of a computer system within which a set of instructions for causing the machine to perform one or more of the methodologies discussed herein may be executed.

DETAILED DESCRIPTION

Embodiments of the invention relate to managing user interactions and, more particularly, to performing analysis on data generated by user interactions. Embodiments of the invention use text mining to extract personal information of users from user interactions automatically. A topic model is used to reduce the number of dimensions required to represent the text, yet all the information of interest is highly pronounced. This enables a lower dimensional representation of the data leading to significantly faster computations.

FIG. 1 is a block schematic diagram showing an architecture used to extract personal information about a user from user interactions according to the invention. The architecture comprises a network 14 that allows at least one user 12 to interact with at least one agent 13. The network may be any of an Internet protocol (IP) based network; a telephone network, such as a public switched telephone network (PSTN); a mobile technology based network; and so on. The interaction between the agent and the user may be any of a text chat based interaction; a voice-based chat, such as a VoIP based service; a voice-based interaction, such as performed over a telephone or a cellular network; a social network based interaction; a forum based interaction; and so on.

An analyzer, connected to the network, extracts the interaction between the user and agent from the network; e.g. chat text for a current interaction is ported to the analyzer, which consumes the data. While the interaction happens, each part of the interaction, i.e. a line of chat or a specific instance of a single speech utterance, etc., is stored in a centralized data store system in which all data generated from multiple systems, including browsing behavior, call flows, system state changes, etc., is stored.

Further, the store can include a complete interaction, such as a complete chat or voice interaction between an agent and a caller, instead of a portion of an interaction at a granular level, as discussed above. If the data exists in disparate sources, application of appropriate processes and technology consolidates the interaction data into a single repository, such as a virtual repository, i.e. a set of multiple repositories or an actual single schema on a single set of servers.

While storing the information, appropriate labels and/or keys are attached with interaction specific information which identifies what the data means. The analyzer crawls through all of the available data and, based on the labels and/or keys, extracts the required interaction data.
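By way of illustration only, the label-based extraction described above might be sketched as follows. The record layout, the key names ("chat_line", "agent_note", "browse_event"), and the function name are assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str             # hypothetical label describing what the data means
    interaction_id: str  # hypothetical identifier tying records to one interaction
    payload: str


def extract_interaction(store, interaction_id, wanted_keys=("chat_line", "agent_note")):
    """Return the payloads of one interaction whose labels are of interest."""
    return [
        record.payload
        for record in store
        if record.interaction_id == interaction_id and record.key in wanted_keys
    ]


# A toy repository mixing interaction data with other system data.
store = [
    Record("chat_line", "c1", "I need a higher credit limit"),
    Record("browse_event", "c1", "/cards/platinum"),
    Record("agent_note", "c1", "life event: wedding"),
    Record("chat_line", "c2", "Where is my order?"),
]
lines = extract_interaction(store, "c1")
```

In this sketch the browsing event is skipped because its label is not among the requested keys, while the chat line and the agent's note for the same interaction are returned.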

In an embodiment of the invention, the analyzer is connected to the agent and fetches the interaction from the agent. During an interaction or just after an interaction, the agent or the system handling the interaction can make appropriate annotations and/or comments through any of various mechanisms, such as through a post-interaction survey, wrap-up forms, interaction notes, etc. These entries into the system can also be appropriately configured to flow into the single repository. This data is used to label the interaction appropriately, e.g. if the chat is related to a person shifting homes or graduating, etc. The interaction systems themselves can make such labeling much easier, for example, by providing a highlighting tool to the agent, where appropriate sections in the chat are highlighted and, on highlighting, the highlighted information is stored as part of the wrap-up form, with appropriate labels and/or keys.

Further, the capability of the analyzer module can be extended such that entry of data into the system by the agent is more convenient. For example, the analyzer can identify appropriate key words or utterances when they occur during an interaction, i.e. in real time, and ask the agent if this is appropriate information for the particular interaction. The agent can confirm that the words are appropriate information or the agent can decide to ignore the suggestion if the words are not appropriate information. In another embodiment of the invention, the agent is given a selection of a drop down from which appropriate information can be selected and associated with the chat during completion of the form itself.

Once the agent enters information during or after the interaction, data flows into the central data repository. By design of the data repository, this data element is associated with an appropriate key. The analyzer can then look through all of the data and extract this information as appropriate.

In embodiments of the invention, the analyzer fetches the interaction in the form of text. If the interaction is voice-based, the interaction is transcribed into text and provided to the analyzer. Customer interaction with the system can be speech based, for example when the user calls and interacts with an interactive voice response (IVR) system through a direct or open dialog; when the user talks to an agent; when the user talks to a self-serve tool which recognizes the speech and helps accordingly; as part of a voice-based search during browsing; and so on. In such cases, while embodiments of the invention provide for performing extraction of the required information directly on voice-based data to understand the customer's context, embodiments of the invention can extract information from the transcript data, which is in the form of text. As used herein, the term transcript refers to text data that is obtained by converting the speech interaction via an appropriate automatic speech recognition (ASR) engine. The ASR engine can be any high accuracy system which takes speech data and outputs text data that is reflective of the speech input. Those skilled in the art will appreciate that such approach does not rule out the possibility of inclusion of speech-based data extraction.

In another embodiment of the invention, the analyzer fetches the interaction from a storage medium, such as a server and/or database in which the interactions are stored.

In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract topics that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: “dog” and “bone” appear more often in documents about dogs, “cat” and “meow” appear in documents about cats, and “the” and “is” appear equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about nine times more dog words than cat words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document's balance of topics is.

The analyzer performs topic modeling by extracting lines from the text. In embodiments of the invention, the extracted lines are referred to herein as anchored text lines and are indicative of personal information that is present in the anchored text lines (see FIG. 5). The analyzer identifies the anchored text lines by checking for specific keywords which may be present when a user is mentioning personal information. The analyzer applies a suitable machine learning or statistical technique, such as a k nearest neighbor (k-NN) classifier or a Naïve Bayes classifier, to the anchored text lines to discover information present in the anchored text lines.
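As a minimal sketch of the keyword check described above, the following example keeps only those chat lines that contain at least one anchoring keyword. The keyword set, the tokenization rule, and the function name are illustrative assumptions; the specification does not enumerate the keywords used.

```python
import re

# Hypothetical anchoring keywords; the actual keyword list is not given in the text.
ANCHOR_KEYWORDS = {"married", "marriage", "wedding", "graduated", "college", "moving"}


def anchored_lines(chat_lines, keywords=ANCHOR_KEYWORDS):
    """Return the lines that mention at least one anchoring keyword."""
    anchored = []
    for line in chat_lines:
        # Lowercase and split into simple word tokens before matching.
        tokens = set(re.findall(r"[a-z']+", line.lower()))
        if tokens & keywords:
            anchored.append(line)
    return anchored


chat = [
    "Hi, I would like to raise my credit limit.",
    "It is for my daughter's marriage expenses.",
    "Thanks for your help.",
]
result = anchored_lines(chat)
```

Only the second line is retained here, because it is the only one containing a keyword from the assumed set. The retained lines would then be passed to the classifier.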

The data required to identify the personal information is gathered from various sources, such as user's past interactions or user profile and/or from current interactions, e.g. as shown in FIG. 5 where a user is chatting with the chat agent. From such user-agent dialog, the text data reveals, for example, that the user is requesting an increase in his credit limit to meet his daughter's marriage expenses. The analyzer fetches texts and assigns the text, if possible, to a specific personal information topic, such as user's marital status, e.g. whether the user is newly married or divorced (see FIG. 6, discussed below).

In an embodiment of the invention, a user profile is continuously generated by evaluating a plurality of different sets of data collected across a plurality of channels, multiple data sources, and unique identifiers comprising all of unique data which corresponds to unique identification parameters of the user, aggregate data, transaction data, and interaction data. The profile includes information that uniquely identifies a user as well as the user's previous interaction experience and personal information which is used to classify the user. In this way, the user profile is continuously updated with information generated in accordance with the invention disclosed herein. Further details on profiles are found in U.S. patent application Ser. No. 14/161,071, filed Jan. 22, 2014, which application is incorporated herein in its entirety by this reference thereto.

In a preferred embodiment of the invention, the analyzer uses the k nearest neighbor approach. In pattern recognition, the k-Nearest Neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

-   In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors, where k is a positive integer, typically small. If k=1, then the object is simply assigned to the class of that single nearest neighbor.
-   In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
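The two cases above can be sketched in a few lines. This is an illustrative example only: Euclidean distance is assumed as the distance measure, and the feature vectors, labels, and function names are invented for the example.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """train is a list of (vector, value) pairs; return the k pairs closest to query."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]


def knn_classify(train, query, k=3):
    """Majority vote among the labels of the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Plain average of the values of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Classification: hypothetical two-dimensional feature vectors with class labels.
labeled = [
    ((0.9, 0.1), "Just married"),
    ((0.8, 0.2), "Just married"),
    ((0.85, 0.15), "Just married"),
    ((0.1, 0.9), "Joined college"),
    ((0.2, 0.8), "Joined college"),
]
predicted_class = knn_classify(labeled, (0.7, 0.3), k=3)

# Regression: hypothetical one-dimensional inputs with numeric property values.
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
predicted_value = knn_regress(numeric, (2.0,), k=3)
```

With k=1, `knn_classify` reduces to returning the label of the single nearest neighbor, matching the special case noted in the list above.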

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms. Both for classification and regression, it can be useful to weight the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor. The neighbors are taken from a set of objects for which the class, for k-NN classification, or the object property value, for k-NN regression, is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
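The distance weighting mentioned above, in which each neighbor receives a weight of 1/d, might look as follows for classification. Euclidean distance, the small constant `eps` that guards against division by zero, and the sample data are assumptions of this sketch.

```python
import math
from collections import defaultdict


def weighted_knn_classify(train, query, k=3, eps=1e-9):
    """Classify query by summing 1/d weights per class over the k nearest neighbors."""
    by_distance = sorted((math.dist(vector, query), label) for vector, label in train)
    scores = defaultdict(float)
    for d, label in by_distance[:k]:
        # eps is an assumption of this sketch; it avoids dividing by zero when d == 0.
        scores[label] += 1.0 / (d + eps)
    return max(scores, key=scores.get)


# One very close "A" neighbor and two more distant "B" neighbors.
train = [((0.0,), "A"), ((1.1,), "B"), ((1.2,), "B")]
weighted_label = weighted_knn_classify(train, (0.1,), k=3)
```

In this example an unweighted vote over the three neighbors would favor "B" by two votes to one, whereas the 1/d weighting lets the single much closer "A" neighbor dominate.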

In embodiments of the invention, when the analyzer is given a new data point, which may be an anchored text line from the text, it picks the k closest points to the new data point, determines the predominant class among the classes in the k closest points, and then assigns that class to the new data point. The analyzer then assigns the user corresponding to the text to at least one class, based on the identified personal information. For example, as shown in FIGS. 5 and 6, if the text corresponding to a user's chat dialogs identifies that the user is recently married (see FIG. 5, discussed below), the user is placed in the class belonging to recently married users. The analyzer assigns texts a value of ‘No Segment’ class if personal information cannot be extracted from the text.

In embodiments of the invention, the analyzer associates an identification with a user (see FIG. 7, discussed below). The identification is, for example, a user ID associated with the user, a phone number, an email address, and so on, as illustrated in FIG. 7. Once the user has been identified, the analyzer also tailors the interactions with the user based on the identified personal information.

In embodiments of the invention, the analyzer performs the tasks of using the extracted personal information. However, it will be obvious to a person of ordinary skill in the art that a different module may interface with the analyzer to perform the task of using the extracted personal information in a suitable manner, as explained in greater detail below.

Consider, for example, where a user is classified as engaged. In such case, the user may be interested in looking for houses, wedding related gifts, honeymoon packages, and so on. The analyzer pushes recommendations to the user accordingly. The recommendations may be in the form of a campaign comprising any of emails, phone calls, online advertisements, tips to agents interacting with the user, and so on.

Consider, for another example, where a user has been classified as having recently purchased a house. In such case, the analyzer provides recommendations to the user related to furnishings, interior decorators, home decor tips, and so on.
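The two examples above amount to a mapping from an assigned class to a set of recommendation categories. A minimal sketch follows; the class names used as keys, the table contents, and the function name are assumptions drawn loosely from the examples rather than a prescribed data structure.

```python
# Hypothetical class-to-recommendation table based on the two examples above.
RECOMMENDATIONS = {
    "Engaged": ["houses", "wedding related gifts", "honeymoon packages"],
    "Recently purchased a house": ["furnishings", "interior decorators", "home decor tips"],
}


def recommend(user_class):
    """Return recommendation categories for a class, or nothing if none are defined."""
    return RECOMMENDATIONS.get(user_class, [])


engaged_offers = recommend("Engaged")
no_segment_offers = recommend("No Segment")
```

A user in a class with no entry, such as the ‘No Segment’ class, simply receives no tailored recommendations in this sketch.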

FIG. 2 is a block schematic diagram showing an analyzer configured for automatically extracting personal information of users from user interactions using a text-mining method according to the invention. In embodiments of the invention, the analyzer comprises a classifier 21, a modeling engine 22, an interface 23, and a database 24. The modeling engine and the classifier may store data in the database at pre-configured intervals or at pre-configured stages. The database may be present internal to the analyzer. In another embodiment of the invention, the database may be present external to the analyzer. The interface enables the analyzer to connect to the network. In an embodiment of the invention, the interface connects the analyzer to the agent.

The modeling engine 22 performs topic modeling on interactions which are in the form of text. The modeling engine extracts anchored text lines from the text. The modeling engine identifies the anchored text lines by checking for specific keywords which may be present when a user is mentioning personal information. The modeling engine applies a suitable statistical technique to the anchored text lines to discover information present in the anchored text lines. The modeling engine represents the text in topic space with a score along each axis, which indicates the extent to which the text contains personal information.

Based on the modeling performed by the modeling engine, the classifier 21 fetches text and assigns the text, if possible, to a specific personal information topic. In embodiments of the invention, the classifier uses the k nearest neighbor approach. In embodiments of the invention, when the classifier is given a new data point, which may be an anchored text line from the text, it picks the k closest points to the new data point, determines the predominant class among the classes in the k closest points, and then assigns that class to the new data point. The classifier then assigns the user corresponding to the text to at least one class, based on the identified personal information. The classifier assigns texts a value of ‘No Segment’ class if personal information cannot be extracted from the text.

In embodiments of the invention, the classifier associates an identification with a user (see FIG. 7, discussed below). The identification is, for example, a user ID associated with the user, a phone number, an email address, and so on. Once the user has been identified, the classifier also tailors the interactions with the user, based on the identified personal information.

FIG. 3 is a flow diagram showing a process for topic modeling according to the invention. The analyzer fetches (301) an interaction. The analyzer may extract the interaction from the network. In an embodiment of the invention, the analyzer is connected to the agent and fetches the interaction from the agent.

The analyzer fetches the interaction, for example, in the form of text. If the interaction is voice-based, the interaction is transcribed into text and provided to the analyzer. In an embodiment of the invention, the analyzer fetches the interaction from a storage medium, such as a server and/or database where the interactions are stored.

The analyzer also performs topic modeling by extracting (302) anchored text lines from the text. The analyzer identifies the anchored text lines by checking for specific keywords which may be present when a user is mentioning personal information. The analyzer discovers (303) information present in the anchored text lines by applying a suitable statistical technique to the anchored text lines. The analyzer represents (304) the text in topic space with a score along each axis to indicate the extent to which the text contains the personal information.

Embodiments of the invention use a topic model to reduce the number of dimensions required to represent the text, but all of the information of interest is highly pronounced. Usually, a text or a document consists of several sentences, such as in the chat dialogs shown in FIG. 5. For example, any piece of a text document can be represented at the topic level instead of the word level. This allows two text documents that do not share common words, e.g. one says “buy” and the other says “sell,” to be regarded as similar because they share the same finance topic. This allows a lower dimensional representation of the data, which leads to significantly faster computation. For this purpose, tools such as principal component analysis, latent semantic analysis, or probabilistic latent semantic analysis are used to reduce the dimensionality and represent the text in lower dimensions.
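To illustrate the effect of moving from the word level to the topic level, the sketch below scores a text along a small number of topic axes. It does not implement principal component analysis, latent semantic analysis, or probabilistic latent semantic analysis; a hand-written word-to-topic table stands in for a learned model, and the table contents, topic names, and function name are assumptions of the example.

```python
import re
from collections import Counter

# Hypothetical lexicon standing in for a learned topic model.
WORD_TO_TOPIC = {
    "buy": "finance", "sell": "finance", "loan": "finance", "credit": "finance",
    "wedding": "family", "daughter": "family", "married": "family",
}
TOPICS = ("finance", "family")  # one axis per topic


def topic_vector(text):
    """Return one score per topic axis: the share of topic-bearing words on that axis."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(WORD_TO_TOPIC[w] for w in words if w in WORD_TO_TOPIC)
    total = sum(counts.values())
    return tuple(counts[t] / total if total else 0.0 for t in TOPICS)


buy_vector = topic_vector("I want to buy shares")
sell_vector = topic_vector("I want to sell shares")
mixed_vector = topic_vector("credit for my daughter wedding")
```

The "buy" and "sell" texts map to the same point in this two-dimensional space, while the third text scores on both axes. However large the vocabulary, each text is represented by only as many numbers as there are topics.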

The various actions (300) shown in FIG. 3 may be performed in the order presented, in a different order, or simultaneously. Further, in some embodiments of the invention, some actions shown in FIG. 3 may be omitted.

FIG. 4 is a flow diagram showing a process for automatically extracting personal information of users from user interactions using a text-mining method according to the invention. The analyzer fetches (401) texts and assigns (402) the text, if possible, to a specific personal information topic. Based on the identified personal information, the analyzer assigns (403) the user corresponding to the text to at least one class.

The various actions (400) shown in FIG. 4 may be performed in the order presented, in a different order, or simultaneously. Further, in some embodiments of the invention, some actions shown in FIG. 4 may be omitted.

FIG. 5 is an example of identification of user's specific personal topics based on chat data according to the invention. In FIG. 5, a classification 50a-55a is derived from a corresponding transcript containing personal information 50b-55b that is extracted from a user-agent interaction.

FIG. 6 is a block schematic diagram showing the identification of a user's personal information topic based on text data according to the invention. In FIG. 6, training chats 62 are processed to extract anchoring keywords 63, resulting in anchored chats 64. Chats with labels 66 are transformed to a trained topic space 67 and, using the anchored chats for topic modeling 69, the chats are transformed in a low dimensional space to chats that are expressive of personal information 68. A multi-class classifier 70 is then applied to the chats and personal traits of the user 72 are identified. In embodiments of the invention, a K-NN classifier, Naïve Bayes classifier, and other such similar machine learning modeling tools can be used to classify the user into one of the predefined classes such as “Just married,” “Joined college,” “Graduated recently,” and so on.

FIG. 7 is a block schematic diagram showing user identification according to the invention. In FIG. 7, user interactions are captured 74, for example via any of chat, voice calls, etc., and the system collects email, phone, or any other personal credential to authenticate the user to the user profile database 76. The user profile contains information regarding the user's personal credentials, such as email ID, phone number, or other such information, along with an aggregated past interaction history for the user 77. As a result, the user ID is assigned 78, where a new user ID is allotted if the user is not authenticated 78. A simple look-up is performed to check whether the user's credentials match those stored in the users' database or profile. If not, then a new user ID is assigned.
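The look-up described above can be sketched as a dictionary keyed by credential. The class name, the ID format ("U1", "U2", and so on), and the choice to remember newly seen credentials for a matched user are assumptions of this example and are not specified in the text.

```python
import itertools


class UserProfileStore:
    """Toy stand-in for the user profile database."""

    def __init__(self):
        self._by_credential = {}             # credential string -> user ID
        self._next_id = itertools.count(1)   # source of new, hypothetical user IDs

    def identify(self, credentials):
        """Return (user_id, is_new) for a collection of credentials such as email or phone."""
        creds = list(credentials)
        for credential in creds:
            if credential in self._by_credential:
                user_id = self._by_credential[credential]
                is_new = False
                break
        else:
            # No credential matched, so a new user ID is allotted.
            user_id = f"U{next(self._next_id)}"
            is_new = True
        # Remember any credentials not yet on file (a design choice of this sketch).
        for credential in creds:
            self._by_credential.setdefault(credential, user_id)
        return user_id, is_new


profiles = UserProfileStore()
first = profiles.identify(["a@example.com"])
second = profiles.identify(["a@example.com", "555-0100"])
third = profiles.identify(["555-0100"])
fourth = profiles.identify(["b@example.com"])
```

In this sketch the third call resolves to the existing user through the phone number recorded during the second call, and the fourth call, matching nothing on file, receives a new ID.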

Computer Implementation

FIG. 8 is a block diagram of a computer system that may be used to implement certain features of some of the embodiments of the invention. The computer system may be a server computer, a client computer, a personal computer (PC), a user device, a tablet PC, a laptop computer, a personal digital assistant (PDA), a cellular telephone, an iPhone, an iPad, a Blackberry, a processor, a telephone, a web appliance, a network router, switch or bridge, a console, a hand-held console, a (hand-held) gaming device, a music player, any portable, mobile, hand-held device, wearable device, or any machine capable of executing a set of instructions, sequential or otherwise, that specify actions to be taken by that machine.

The computing system 40 may include one or more central processing units (“processors”) 45, memory 41, input/output devices 44, e.g. keyboard and pointing devices, touch devices, display devices, storage devices 42, e.g. disk drives, and network adapters 43, e.g. network interfaces, that are connected to an interconnect 46.

In FIG. 8, the interconnect is illustrated as an abstraction that represents any one or more separate physical buses, point-to-point connections, or both connected by appropriate bridges, adapters, or controllers. The interconnect, therefore, may include, for example, a system bus, a peripheral component interconnect (PCI) bus or PCI-Express bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus, also referred to as Firewire.

The memory 41 and storage devices 42 are computer-readable storage media that may store instructions that implement at least portions of the various embodiments of the invention. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, e.g. a signal on a communications link. Various communications links may be used, e.g. the Internet, a local area network, a wide area network, or a point-to-point dial-up connection. Thus, computer readable media can include computer-readable storage media, e.g. non-transitory media, and computer-readable transmission media.

The instructions stored in memory 41 can be implemented as software and/or firmware to program one or more processors to carry out the actions described above. In some embodiments of the invention, such software or firmware may be initially provided to the processing system 40 by downloading it from a remote system through the computing system, e.g. via the network adapter 43.

The various embodiments of the invention introduced herein can be implemented by, for example, programmable circuitry, e.g. one or more microprocessors, programmed with software and/or firmware, entirely in special-purpose hardwired, i.e. non-programmable, circuitry, or in a combination of such forms. Special-purpose hardwired circuitry may be in the form of, for example, one or more ASICs, PLDs, FPGAs, etc.

Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should only be limited by the Claims included below.

The invention claimed is:
 1. A computer implemented method for performing analysis on data generated by user interactions, comprising: providing a processor executing instructions for receiving text information from at least one interaction between a user and an agent; said processor text mining said interaction information to extract personal information relating to said user automatically; said processor using a topic model to extract lines from said text information to reduce a number of dimensions required to represent the text, wherein all information of interest is highly pronounced, and wherein a resulting lower dimensional representation of the text allows significantly faster computations; said processor extracting said lines of text as anchored text lines that are indicative of personal information that is present in the anchored text lines; said processor identifying said anchored text lines by checking for specific keywords which are present when said user is mentioning personal information during said interaction; said processor applying a statistical technique to said anchored text lines to discover information present in the anchored text lines; said processor using a k nearest neighbor algorithm to discover said information present in the anchored text lines; said processor representing said text in a topic space with a score along each axis to indicate an extent to which said text contains personal information about said user; upon receiving a new data point, which optionally comprises an anchored text line from said text, said processor picking the k closest points to said new data point, determining a predominant class among classes in the k closest points, and assigning said predominant class to said new data point; said processor assigning a user corresponding to said text to at least one class based on said identified personal information; and said processor classifying said user based upon said extracted lines of text.
 2. The method of claim 1, further comprising: said processor extracting said interaction information from any of a network and said agent.
 3. The method of claim 1, further comprising: said processor initially transcribing said interaction into text.
 4. The method of claim 1, wherein said personal information topic space comprises any of marital status, age, date of birth, travel plans, anniversary date, preferred brands, family related information, financial information, level of education, vehicles owned, location, health related information, level of familiarity with a specific area, and price consciousness.
 5. The method of claim 1, further comprising: said processor gathering data required to identify said personal information from any of said user's past interactions, a user profile, and from current interactions.
 6. The method of claim 5, further comprising: said processor continuously generating said user profile by evaluating a plurality of different sets of said data collected across a plurality of channels, multiple data sources, and unique identifiers comprising all of unique data which corresponds to unique identification parameters of the user, aggregate data, transaction data, and said interaction data, said profile including information that uniquely identifies a user as well as the user's previous interaction experience and personal information which is used to classify the user.
 7. The method of claim 1, further comprising: said processor assigning texts a value of ‘No Segment’ class if personal information about the user cannot be extracted from said text.
 8. The method of claim 1, further comprising: said processor associating an identification with said user.
 9. The method of claim 8, wherein said identification comprises any of a user ID associated with the user, a phone number, and an email address.
 10. The method of claim 8, further comprising: once said user has been identified, said processor tailoring interactions with the user based on the identified personal information.
 11. An apparatus for performing analysis on data generated by user interactions, comprising: a processor executing instructions for receiving text information from at least one interaction between a user and an agent; said processor text mining said interaction information to extract personal information relating to said user automatically; said processor using a topic model to extract lines from said text information to reduce a number of dimensions required to represent the text, wherein all information of interest is highly pronounced, and wherein a resulting lower dimensional representation of the text allows significantly faster computations; said processor extracting said lines of text as anchored text lines that are indicative of personal information that is present in the anchored text lines; said processor identifying said anchored text lines by checking for specific keywords which are present when said user is mentioning personal information during said interaction; said processor applying a statistical technique to said anchored text lines to discover information present in the anchored text lines; said processor using a k nearest neighbor algorithm to discover said information present in the anchored text lines; said processor representing said text in a topic space with a score along each axis to indicate an extent to which said text contains personal information about said user; upon receiving a new data point, which optionally comprises an anchored text line from said text, said processor picking the k closest points to said new data point, determining a predominant class among classes in the k closest points, and assigning said predominant class to said new data point; said processor assigning a user corresponding to said text to at least one class based on said identified personal information; and said processor classifying said user based upon said extracted lines of text.